16 Support Vector Machines

Algorithmics 2025

Area of Study 3: Computer science: past and present

Learning Intentions

Key knowledge

  • the concept of training algorithms using data
  • the concepts of model overfitting and underfitting
  • support vector machines (SVM) as margin-maximising linear classifiers, including:
    • the geometric interpretation of applying SVM binary classification to one- or two-dimensional data
    • the creation of a second feature from one-dimensional data to allow linear classification

Key skills

  • explain, at a high level, how data-driven algorithms can learn from data
  • explain the optimisation objectives for training SVM and neural network binary classifiers
  • explain how higher-dimensional data can be created to allow for linear classification

Machine Learning Algorithms

A machine learning algorithm is a procedure that allows a computer to improve its performance at a task by learning from data, rather than being given only explicit, hand-coded instructions.

  • It takes examples (data) as input.
  • It uses a model to find patterns or rules in that data.
  • It can then make predictions or decisions on new, unseen inputs.

Machine Learning Algorithms

  • Traditional algorithms: every step is written out by a programmer.

  • Machine learning algorithms: the computer adjusts its own internal rules (parameters) automatically, based on training data.

  • Examples:

    • Neural network – adjusts weights between “neurons” to recognise patterns.
    • Support vector machine (SVM) – finds the best boundary (hyperplane) to separate categories.
  • The machine can adjust the values of its own parameters, but it does not create them: which parameters exist is part of the model's design.
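As a minimal illustration of this idea (the data and the one-parameter threshold model are invented for this sketch, not part of the course materials), a program can "learn" a single parameter, a threshold t, from labelled examples instead of having it hard-coded:

```python
# A toy "learned" model with one parameter: the threshold t is chosen
# from the training data, not written by the programmer.

def train_threshold(data):
    """Return the threshold t that best separates the training data.

    data: list of (x, label) pairs with label in {-1, +1}.
    """
    xs = sorted(x for x, _ in data)
    # Candidate thresholds: midpoints between consecutive training points.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def accuracy(t):
        return sum(1 for x, y in data if (1 if x > t else -1) == y) / len(data)

    return max(candidates, key=accuracy)

train = [(-2, -1), (-1, -1), (1, +1), (2, +1)]
t = train_threshold(train)                      # learned from the data
predict = lambda x: 1 if x > t else -1          # classify new, unseen inputs
```

Here the programmer wrote the *procedure* for choosing t, but the value of t itself comes from the training examples.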

Support Vector Machines (SVMs)

  • A support vector machine (SVM) is a supervised machine learning algorithm.
  • Its main purpose is classification, especially binary classification (sorting data into one of two categories):
    • Email filtering (spam / not spam).
    • Image recognition (cat / not cat).
    • Medical diagnostics (disease / no disease).

Feature extraction or vectorisation

  • Features are measurable properties of the data (e.g. word counts, colours, weights).
  • Classification is assigning the data to a category based on those features.

Task: classify an email as spam or not spam.

  • Features might include:
    • Count of special words (e.g. “$$$”, “win”, “free”)
    • Number of links
    • Length of the email
    • Sender’s domain
  • The chosen features are combined into a feature vector, e.g. \(\mathbf{x} = (3, 1, 0, 4)\)
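The feature extraction step can be sketched in Python (the spam-word list and the exact features are invented for illustration; the sender's domain is a categorical feature that would need its own numeric encoding, so this sketch leaves it out):

```python
# Turn an email's text into a numeric feature vector.
# Features here (all invented for illustration):
#   1. count of "spam" words, 2. count of links, 3. length in words.

SPAM_WORDS = {"$$$", "win", "free"}

def email_features(text):
    """Return a (spam-word count, link count, word count) feature tuple."""
    words = text.lower().split()
    spam_count = sum(1 for w in words if w in SPAM_WORDS)
    link_count = sum(1 for w in words if w.startswith("http"))
    return (spam_count, link_count, len(words))

features = email_features("win free $$$ now http://x.example")
```

Every email, whatever its contents, is reduced to a vector of the same length, so different emails can be compared as points in the same space.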

Training the SVM

Training involves comparing a large set of pre-classified feature vectors (examples whose correct class is already known).

  • The SVM looks for the best separating boundary (called a hyperplane) between the two classes of data.

  • It chooses the hyperplane that maximises the margin:

    • the margin is the distance between the hyperplane and the closest data points

    • these closest data points are called the support vectors.
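A sketch of the optimisation idea, assuming subgradient descent on the regularised hinge loss as a stand-in for a real SVM solver (the toy data, learning rate and epoch count are invented; production SVMs use dedicated optimisers, and this version only approximates the exact maximum-margin boundary):

```python
# Toy linear-SVM training: nudge the boundary whenever a training point
# falls inside the margin, and shrink the weights slightly otherwise
# (the shrinking is what pushes the margin to be as wide as possible).

def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=200):
    """Learn weights w and bias b for sign(w.x + b) on 2-D points."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:  # point inside the margin: push the boundary away
                w[0] += lr * (y * x1 - lam * w[0])
                w[1] += lr * (y * x2 - lam * w[1])
                b += lr * y
            else:           # safely classified: only apply the margin penalty
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

# Two toy 2-D classes: +1 clustered near (2, 2), -1 near (-2, -2).
X = [(2, 2), (3, 2), (2, 3), (-2, -2), (-3, -2), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
predict = lambda p: 1 if w[0] * p[0] + w[1] * p[1] + b > 0 else -1
```

Only the points that end up near the boundary (the support vectors) keep triggering updates; points far from it stop influencing the result.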

Bias and Variance in Classification

Two types of errors when classifying data:

Bias - underfitting 🎯

  • Analogy: arrows clustered together but far from the bullseye
  • Comes from a model that is too simple
  • Misses the real patterns
  • Leads to systematic error (underfitting)

Variance - overfitting 🎯

  • Analogy: arrows scattered widely around the target
  • Comes from a model that is too complex
  • Fits the noise as well as the signal
  • Leads to unreliable predictions (overfitting)

The Trade-off

  • High bias → underfitting
  • High variance → overfitting
  • Goal = balance → arrows tightly grouped around the bullseye
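One way to see the trade-off in code (toy data invented for illustration): compare a deliberately simple model with a model that memorises the training set, on noisy 1-D data whose true rule is "positive when x > 0":

```python
# Two of the training labels are deliberately flipped to act as noise.
train = [(-3, -1), (-2, -1), (-1, 1), (1, 1), (2, -1), (3, 1)]
test = [(-2.6, -1), (-1.2, -1), (1.2, 1), (2.2, 1)]  # clean, unseen data

def accuracy(predict, data):
    return sum(1 for x, y in data if predict(x) == y) / len(data)

# Simple model: a fixed threshold at 0. It cannot fit the noisy points,
# but it matches the underlying signal.
simple = lambda x: 1 if x > 0 else -1

# Memoriser: copy the label of the nearest training point.
# Perfect on the training set, including its noise.
def memoriser(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]
```

The memoriser scores 100% on the training data (high variance, overfitting) but does worse on the unseen test data; the simple threshold misses the noisy training points (its training error looks worse) yet generalises better.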

Support Vector Machines

Key Vocabulary for SVM

  • Support vector – the data points that are closest to the separating boundary; they determine the position of the hyperplane.
  • Margin – the distance between the separating hyperplane and the nearest support vectors; SVM maximises this.
  • Hyperplane – the boundary SVM draws to separate the classes (a line in 2D, a plane in 3D, etc.).
  • Bias – error caused by using a model that is too simple (underfitting).
  • Variance – error caused by a model that is too complex and too sensitive to training data (overfitting).
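The key-knowledge point about creating a second feature from one-dimensional data can be sketched as follows (the data are invented for illustration). Points near the origin cannot be split from points far from the origin by any single threshold on x, but after lifting each point to \((x, x^2)\) a horizontal line separates the two classes:

```python
# 1-D data that is NOT linearly separable: class +1 sits between the
# two class -1 points, so no single threshold on x works.
inside = [-1.0, 0.0, 1.0]    # class +1: near the origin
outside = [-3.0, 3.0]        # class -1: far from the origin

# Create a second feature from the one-dimensional data.
lift = lambda x: (x, x * x)

lifted_inside = [lift(x) for x in inside]
lifted_outside = [lift(x) for x in outside]

# In the lifted 2-D space the horizontal line x2 = 5 separates the
# classes, so a linear classifier such as an SVM can now be used.
```

The lifted data is linearly separable precisely because x² encodes distance from the origin, which is the property that actually distinguishes the classes.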